Purpose

The purpose of this workshop is to give an example of a specific application of logistic regression analysis to phonetic categorisation data, using brms in R.

This workshop is not a full introduction to logistic regression, nor to Bayesian modeling, and we won’t be getting into a lot of the technical aspects.

See here, or other sources, for more details if you are interested or need more background.

This book, An Introduction to Bayesian Data Analysis for Cognitive Science by Bruno Nicenboim, Daniel Schad, and Shravan Vasishth, is also a great place to learn about the general principles, and section 4.3 is a good introduction to logistic regression.

Logistic regression

Logistic regression is a common way of modeling binary responses, which are very common for phonetic categorisation tasks.

A very important thing to keep in mind is that the units in logistic regression are log odds: they are a way of converting probabilities, which are bounded between 0 and 1, into an unbounded variable that can be submitted to linear regression models (see the links above for more).

Most critically, this means that when you interpret the output from your models you must think in log odds. The coefficients for the fixed effects in your models will reflect how changing a particular factor changes the log odds of a particular response.
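To make the log odds scale concrete, here is a minimal base-R sketch (separate from the workshop data) converting probabilities to log odds and back with qlogis() and plogis():

```r
# logit: probability -> log odds; inverse logit: log odds -> probability
p <- c(0.1, 0.5, 0.9)
log_odds <- qlogis(p)   # log(p / (1 - p)): about -2.2, 0, 2.2
plogis(log_odds)        # back to 0.1, 0.5, 0.9
```

Note that a probability of 0.5 maps to 0 log odds, so a positive coefficient pushes a response above chance and a negative one below it.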

Thinking about your data

Before you do any modeling, it's important to think about the data and how it relates to your research questions.

The example data and research questions

The data comes from an experiment where two groups of pre-school children listened to voice onset time (VOT) continua. They heard a token from the continuum and clicked on an image representing one member of a minimal pair.

The children were either monolingual English speakers/learners, or bilingual English/Spanish speakers/learners. There are 21 bilinguals and 26 monolinguals in this data set, and they each categorised 16 stimuli (8 VOT tokens at each place of articulation).

The research questions here are (in simplified form):

  • Do bilinguals and monolinguals perceive the stimuli differently? The prediction from the literature is that bilinguals will be influenced by Spanish, which has short-lag VOT for voiceless stops, unlike English. Therefore bilinguals will tend to hear these stimuli as voiceless.
  • Are the different places of articulation perceived differently? The data is not ideal for this question, since the words differ in more than just the word-initial stop, but we’ll entertain this question for illustrative purposes here.

Variables

You should identify what the relevant dimensions are for your data and analysis plans. In this case we have:

  • The VOT continuum, a numeric variable which ranged from 0-90 ms; represented as steps 1-10, but excluding steps 2 and 9, giving {1, 3-8, 10}.

  • The participant group, a categorical variable with two levels, bilingual or monolingual

  • The place of articulation for the stop (note that this also corresponds to different words/contexts for the stops)

Visualising the data

The essence of data like this is that it is binary - participants choose one of the two pictures (for example, toe or doe), and they can only choose one of the two options. The plots below show the proportion of responses given at each VOT step.

A more common way to show categorisation data is to plot the proportion of one response type.

If we load in and look at the data, we can see that what is being plotted as a proportion on the y axis is the numerical ‘responsenum’ variable, which is 1 for voiceless and 0 for voiced responses.

data<-read.csv("data.csv")
head(data)
##     place step     group  ID responsenum response
## 1 coronal    1 bilingual P27           0      doe
## 2 coronal    3 bilingual P27           0      doe
## 3 coronal    4 bilingual P27           0      doe
## 4 coronal    5 bilingual P27           1      toe
## 5 coronal    6 bilingual P27           1      toe
## 6 coronal    7 bilingual P27           1      toe
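Because responsenum is coded 0/1, the proportions in plots like those above are just means of responsenum within each cell. A toy illustration (a hypothetical mini data set, not the workshop data):

```r
# The mean of a 0/1 response variable is the proportion of 1s,
# computed here per continuum step.
toy <- data.frame(step = c(1, 1, 1, 10, 10, 10),
                  responsenum = c(0, 0, 1, 1, 1, 1))
aggregate(responsenum ~ step, data = toy, FUN = mean)
```

The same idea applied to the loaded data, e.g. aggregate(responsenum ~ step + group, data = data, FUN = mean), gives proportions like those plotted.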

Data analysis

First, we will load in the relevant packages.

library(brms);library(bayestestR)

Below I make the group variable a factor and then set bilinguals to be the reference level in the model. I also “scale” the continuum step. This is equivalent to z-scoring: the value of 0 is now the mean of the data, and other values are represented as standard deviations away from that mean.

  • Centering continuum step here is really important for interpreting the intercept. If not, the intercept will be at VOT step 0 - a not-real step number that participants never heard (!), since steps were numbered 1-10.
data$group <-as.factor(data$group)
# contrasts(data$group)<-c(-0.5,0.5) alternate coding scheme
contrasts(data$group)<-c(0,1)
data$step.scaled<-scale(data$step)
data$step.centered<-scale(data$step,scale=F) # another way, without z-scoring
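To see what scale() is doing, here is a toy example (numbers chosen for readability, not taken from the workshop data):

```r
x <- c(1, 3, 5)                      # mean = 3, sd = 2
as.numeric(scale(x, scale = FALSE))  # centered only: -2, 0, 2
as.numeric(scale(x))                 # centered and z-scored: -1, 0, 1
```

Both versions put 0 at the mean; z-scoring additionally expresses distances in standard-deviation units.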

I won’t say much about priors here - all you need to know is that these priors are not influencing the model much at all - they are weakly informative. If you want to pursue more advanced analyses with Bayesian modeling, you can learn about making reasonable prior choices and their possible impact on the results. This chapter is a good illustration of priors for logistic regression specifically.

priors_normal = c(prior(normal(0,1.5), class = Intercept),
                  prior(normal(0,1.5), class = b))

Alright let’s run the model - this should hopefully look generally familiar to you.

model1<-brm(formula=responsenum~step.scaled*group+
             (1+step.scaled|ID),
           prior = priors_normal,
           data = data,
           family = bernoulli(link = "logit"),
           file="model1",
           iter = 4000, 
           warmup = 1000,cores=4,chains = 4)

One part that may be new to you is (1+step.scaled|ID). These are so-called random effects.

  • ID refers to each individual participant, and the 1 gives each participant their own random intercept. This effectively tells the model that there is more structure to the data - i.e. different observations come from a cluster: the participant, in this case. It’s important to include this in models, since if not, the model may be over-confident when estimating group differences.

  • step.scaled is the same variable as in the fixed effects and it is called a random slope. What it does is allow the model to account for the fact that individuals may vary in how they perceive the continuum. The general advice is that any fixed effect which participants see multiple levels of should be a random slope. For this reason the group variable is not a random slope, since each participant is part of only one group (i.e., there is no participant who has some data which is “bilingual” and some data which is “monolingual”).

Having a conceptual understanding of the above is a good start, but to get more details and a fuller understanding I recommend this chapter.

Interpreting the model

Once you’ve fit your model you have to reason about the results. As you will know, the result from Bayesian modeling is a posterior distribution over parameter values. There are different ways to reason about these, but we will talk about two here: credible intervals and ‘probability of direction’. First let’s plot and interpret the full posterior distributions for the fixed effects.

Visualising the model estimates and predictions

plot(model1,variable = "^b_", regex = TRUE)

We can also plot the predictions from the model. These are given as the estimated mean and 95% credible intervals for a voiceless response based on variation in the fixed effects. This is a really good way to see what your model is doing. Here are the two fixed effects. Conveniently, these are converted to probabilities from log-odds automatically.

conditional_effects(model1,effects=c("group","step.scaled"))

And we can plot something that resembles a categorisation function by plotting the interaction of the two fixed effects.

conditional_effects(model1,effects=c("step.scaled:group"))

Numerical summaries of the posterior

Finally, let’s look at the summary of the model; the part we are interested in is the Population-Level Effects.

summary(model1)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: responsenum ~ step.scaled * group + (1 + step.scaled | ID) 
##    Data: data (Number of observations: 752) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 47) 
##                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)                  0.60      0.20     0.20     1.00 1.00     3154
## sd(step.scaled)                0.86      0.31     0.22     1.50 1.00     2773
## cor(Intercept,step.scaled)     0.39      0.34    -0.41     0.92 1.00     3394
##                            Tail_ESS
## sd(Intercept)                  3226
## sd(step.scaled)                2562
## cor(Intercept,step.scaled)     4620
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept              1.00      0.22     0.58     1.46 1.00     7136     7275
## step.scaled            2.43      0.34     1.82     3.14 1.00     4467     4666
## group1                -0.60      0.28    -1.15    -0.06 1.00     7941     7122
## step.scaled:group1    -0.03      0.40    -0.83     0.76 1.00     6346     6837
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Credible intervals

The posterior distributions are continuous, but often researchers consider (somewhat arbitrary) cutoffs to decide if an effect is robust, or as commonly referred to in the Bayesian sphere, “credible”. A common way is to consider 95% credible intervals: the range in which 95% of the posterior probability mass for an estimate falls.
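Concretely, an equal-tailed 95% credible interval can be read off posterior draws with quantile(). A sketch using simulated draws as a stand-in for the real posterior (numbers loosely echo the step.scaled estimate above):

```r
# Simulated stand-in for 12000 posterior draws of one coefficient.
set.seed(1)
draws <- rnorm(12000, mean = 2.4, sd = 0.34)
quantile(draws, probs = c(0.025, 0.975))  # roughly 2.4 +/- 1.96 * 0.34
```

With a real brms model you would apply the same logic to the draws extracted from the fitted object rather than to simulated values.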

In the above model summary, we can see the median estimate and 95% CrI for each effect - take a second to relate these back to the distributions for the intercept and fixed effects above.

  • When 95% CrI exclude the value of 0, this means that the model is consistently estimating a non-zero value.

  • What does this mean for the intercept? Remember we’re talking about log-odds.

  • What does this mean for fixed effects? Think about both continuum step, and group.

Probability of direction

We can also summarise the posterior by showing the percentage that has a particular directionality - this is often called “probability of direction”.

pd(model1)
## Probability of Direction
## 
## Parameter          |     pd
## ---------------------------
## (Intercept)        |   100%
## step.scaled        |   100%
## group1             | 98.38%
## step.scaled:group1 | 52.86%
  • A posterior distribution centered exactly on zero would have 50% of the distribution on each side, and so would have a probability of direction = 50. Does this seem like good evidence for an effect?

  • When 95% CrI just barely exclude zero, probability of direction = 97.5 (think about why). Does this seem like good evidence for an effect?

  • When all of a posterior distribution sits on one side of zero, then probability of direction = 100. Does this seem like good evidence for an effect?

  • Question for you: How would you reason about a fixed effect where probability of direction = 92?

Take a look at the pd values above, and relate them to both the model summaries and the fixed effect estimate distributions.
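Probability of direction is easy to compute by hand from posterior draws: it is the proportion of draws on the dominant side of zero. A sketch with simulated draws (a stand-in for the real posterior, with values loosely echoing the group effect above):

```r
# Simulated stand-in for posterior draws of a group-effect coefficient.
set.seed(1)
draws <- rnorm(12000, mean = -0.6, sd = 0.28)
pd_hand <- max(mean(draws > 0), mean(draws < 0))
pd_hand  # roughly 0.98
```

This is the same quantity that pd() reports, just expressed as a proportion rather than a percentage.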

Reporting

Here is one fairly standard way to report an effect.

“The model estimates a credibly non-zero difference between groups, wherein the monolingual group shows a relative decrease in the log-odds of a voiceless response in comparison to the bilingual reference level (\(\hat\beta\) = -0.60, 95%CrI = [-1.15,-0.06], pd = 98). On this basis we conclude that there is convincing evidence for a group difference: bilingual children are more likely to perceive the stimuli as voiceless /p/ and /t/. However, the estimated magnitude of the effect is variable, and may be quite small.”
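To make the reported effect more tangible, we can convert the point estimates back to the probability scale with plogis(). This uses just the posterior medians from the summary above, evaluated at the mean continuum step (step.scaled = 0), and ignores posterior uncertainty, so treat it as a rough illustration:

```r
b_intercept <- 1.00   # bilingual group at the mean continuum step (log odds)
b_group     <- -0.60  # shift for the monolingual group (log odds)
plogis(b_intercept)             # P(voiceless) for bilinguals, about 0.73
plogis(b_intercept + b_group)   # P(voiceless) for monolinguals, about 0.60
```

A fuller treatment would apply plogis() to the whole posterior (or use conditional_effects(), which does this automatically) so that the uncertainty carries over to the probability scale.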

Practice

Build a new model to examine the place of articulation/word effects, and their interaction with the group variable and step variable.

You should answer the following questions: is there evidence for an effect of place of articulation on the probability of voiceless responses? Does this interact with the group variable or the continuum step variable?

  • How will you code the place variable?
  • What will you add as random effects and fixed effects to the model code above?
  • How will you interpret the model output to answer the questions above?

A possible answer to the practice problem

Coding the contrasts. I decided to treatment code place: the intercept will be for bilabial stops.

data$place <-as.factor(data$place)
contrasts(data$place)<-c(0,1) 

Running the model; note that place is a random slope.

model2<-brm(formula=responsenum~step.scaled*group*place+
             (1+step.scaled*place|ID),
           prior = priors_normal,
           data = data,
           family = bernoulli(link = "logit"),
           file="model2",
           iter = 4000, 
           warmup = 1000,cores=4,chains = 4)

Let’s look at the estimates and conditional effects…

plot(model2,variable = "^b_", regex = TRUE)

conditional_effects(model2,effects=c("group","step.scaled","place"))

Here we can plot conditional effects for place, group and step.scaled; we have to do something extra here and make the conditions explicitly with make_conditions().

place.conditions<- make_conditions(model2, "place")
conditional_effects(model2, "step.scaled:group", conditions = place.conditions)

And finally, let’s look at the numerical summaries.

summary(model2);pd(model2)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: responsenum ~ step.scaled * group * place + (1 + step.scaled * place | ID) 
##    Data: data (Number of observations: 752) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 47) 
##                                     Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                           1.03      0.28     0.52     1.62 1.00
## sd(step.scaled)                         0.81      0.36     0.12     1.56 1.00
## sd(place1)                              1.13      0.44     0.21     1.99 1.00
## sd(step.scaled:place1)                  1.53      0.65     0.32     2.89 1.00
## cor(Intercept,step.scaled)              0.34      0.30    -0.35     0.83 1.00
## cor(Intercept,place1)                  -0.57      0.28    -0.92     0.15 1.00
## cor(step.scaled,place1)                -0.23      0.34    -0.81     0.51 1.00
## cor(Intercept,step.scaled:place1)       0.16      0.33    -0.50     0.77 1.00
## cor(step.scaled,step.scaled:place1)     0.34      0.35    -0.42     0.90 1.00
## cor(place1,step.scaled:place1)          0.20      0.35    -0.57     0.79 1.00
##                                     Bulk_ESS Tail_ESS
## sd(Intercept)                           3044     4361
## sd(step.scaled)                         2233     2239
## sd(place1)                              2200     1974
## sd(step.scaled:place1)                  3212     3324
## cor(Intercept,step.scaled)              5779     7273
## cor(Intercept,place1)                   4297     4184
## cor(step.scaled,place1)                 5297     7209
## cor(Intercept,step.scaled:place1)       6150     7166
## cor(step.scaled,step.scaled:place1)     3474     4169
## cor(place1,step.scaled:place1)          5815     7324
## 
## Population-Level Effects: 
##                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                     1.17      0.33     0.54     1.85 1.00     6247
## step.scaled                   2.07      0.38     1.40     2.88 1.00     5349
## group1                       -0.75      0.41    -1.57     0.06 1.00     7103
## place1                       -0.04      0.43    -0.88     0.81 1.00     8180
## step.scaled:group1           -0.06      0.44    -0.91     0.81 1.00     7307
## step.scaled:place1            2.10      0.71     0.82     3.62 1.00     7844
## group1:place1                 0.26      0.54    -0.82     1.32 1.00     8434
## step.scaled:group1:place1     0.32      0.76    -1.15     1.84 1.00    12572
##                           Tail_ESS
## Intercept                     7386
## step.scaled                   6972
## group1                        7469
## place1                        8308
## step.scaled:group1            7939
## step.scaled:place1            8019
## group1:place1                 8641
## step.scaled:group1:place1     9463
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Probability of Direction
## 
## Parameter                 |     pd
## ----------------------------------
## (Intercept)               | 99.99%
## step.scaled               |   100%
## group1                    | 96.58%
## place1                    | 53.48%
## step.scaled:group1        | 55.79%
## step.scaled:place1        | 99.93%
## group1:place1             | 69.13%
## step.scaled:group1:place1 | 66.00%

Some interpretation

You can see that there is a credible 2-way interaction between step.scaled and place, which indicates that the slope of the categorisation function is shallower for the bilabial place of articulation. The positive sign of the interaction indicates a greater change in the log-odds of a voiceless response along the continuum when moving from the reference level (bilabial) to coronal stops. You should think about the interpretation of the other effects as well!